Improve ConfigNode leader warm-up before serving#17821
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #17821 +/- ##
============================================
+ Coverage 40.54% 40.69% +0.14%
+ Complexity 2622 2621 -1
============================================
Files 5244 5244
Lines 362367 362567 +200
Branches 46651 46678 +27
============================================
+ Hits 146938 147552 +614
+ Misses 215429 215015 -414 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Caideyipi
left a comment
There was a problem hiding this comment.
I reviewed the warm-up changes on a57680d2542. I think there are a few issues that should be fixed before merge:
-
AINode treats the new warm-up status as a hard failure.
ConfigManager.registerAINode()now returnsCONFIG_NODE_LEADER_WARMING_UPwhileconfirmLeader()is warming up, but the Python AINode client only treatsREDIRECTION_RECOMMENDas retryable in_update_config_node_leader().node_register()/node_restart()then callverify_success()and raise on status1014, so an AINode can fail startup if it hits the leader during warm-up. Please add the new code to the AINode constants and retry handling paths. -
Non-seed ConfigNode registration has the same gap.
registerConfigNode()can now returnCONFIG_NODE_LEADER_WARMING_UP, butConfigNode.sendRegisterConfigNodeRequest()only retries success/redirection/internal-retry statuses and throwsStartupExceptionfor anything else. A ConfigNode joining during leader warm-up can fail immediately instead of waiting and retrying. -
The async leader-service startup has a stepdown race.
notifyLeaderReady()now submitsstartLeaderServicesAfterLoadReady()asynchronously. That task checksisLeaderReady()only once before starting leader-only services and settingleaderServicesReady=true. IfnotifyNotLeader()runs after that check but before/during service startup, the old task can re-enable services after cleanup. Please guard this with a leader epoch/cancellation token, and re-check before settingleaderServicesReady. -
The DataNode register retry budget is too tight for the 30s warm-up tolerance. On
CONFIG_NODE_LEADER_WARMING_UP,updateConfigNodeLeader()sleeps 2s and returns retryable, whileregisterDataNode()has 15 attempts. The final request can still happen before the 30s tolerance expires, then sleep and exit without one post-tolerance attempt. A deadline-based retry or a larger retry budget would avoid this edge case.
There was a problem hiding this comment.
Pull request overview
This PR improves ConfigNode leader “warm-up” semantics so DataNodes avoid premature redirection during leader transitions, and ConfigNode serving is gated on initial heartbeat sampling readiness.
Changes:
- Add a dedicated
CONFIG_NODE_LEADER_WARMING_UPstatus and have DataNodes wait/retry the current leader during warm-up. - Introduce
LoadManager.isLoadReady()and a 30s tolerance window to require first heartbeat coverage (ConfigNode/DataNode) before considering load services ready. - Track consensus-group heartbeat sampling coverage and add tests for warm-up readiness behavior.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/protocol/client/ConfigNodeClient.java | Treat CONFIG_NODE_LEADER_WARMING_UP as a wait-and-retry instead of redirection. |
| iotdb-core/confignode/src/test/java/org/apache/iotdb/confignode/manager/load/LoadManagerTest.java | Add tests validating load warm-up readiness criteria and 30s tolerance behavior. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/LoadManager.java | Add load readiness state machine (isLoadReady, reason strings, tolerance window). |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/cache/LoadCache.java | Track consensus-group sampled nodes; add node-heartbeat unready reasons; cache “unreported” samples. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/cache/consensus/ConsensusGroupCache.java | Always update consensus stats from last sample (including “unready leader”). |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/cache/AbstractLoadCache.java | Add hasHeartbeatSample() helper. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/consensus/ConsensusManager.java | Gate leader confirmation on consensus-ready + leader-services-ready + load-ready; return warming-up status. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/consensus/statemachine/ConfigRegionStateMachine.java | Track leaderServicesReady and start load services before leader services. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/client/async/handlers/heartbeat/DataNodeHeartbeatHandler.java | Improve null-safety and cache consensus/region samples; add missing-region sampling on partial reports. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/client/async/handlers/heartbeat/ConfigNodeHeartbeatHandler.java | On error, force-update node cache to Unknown (no connection-broken check). |
| iotdb-client/service-rpc/src/main/java/org/apache/iotdb/rpc/TSStatusCode.java | Add CONFIG_NODE_LEADER_WARMING_UP(1014). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@Caideyipi Thanks for the detailed review. Fixed in the latest commits, especially f54b484.
I also cleaned up the follow-up warm-up sampling concerns: removed the unreported DataNode Region heartbeat chain, removed the extra DataNodeHeartbeatHandler region-group argument, and kept consensus sampling to only cache leader samples when the DataNode reports leader=true with a consensus logical timestamp. |
Caideyipi
left a comment
There was a problem hiding this comment.
I still see one remaining stepdown race that I think should be fixed before merge.
ConfigRegionStateMachine.submitIfLeaderServicesEpochCurrent() checks the epoch only before invoking task.run(). If an async task passes that check, then notifyNotLeader() runs and finishes cleanup, the old task can resume and start leader-only services such as startCQScheduler(), startPipeMetaSync(), startPipeHeartbeat(), or startSubscriptionMetaSync() after the node is no longer leader.
The main startup path is guarded by leaderServicesLock, but these submitted tasks are not serialized with cleanup. Please either run the task under leaderServicesLock and re-check isCurrentLeaderServicesEpoch(epoch) inside the lock, or pass a cancellation/epoch guard into the individual service start paths.
|
@Caideyipi Good catch — thanks. Fixed in e074337 by reworking the leader-services lifecycle so this race can no longer happen. The root cause was that
Within a single become-leader epoch, load services still start first (for warm-up), then the remaining independent services start in parallel on a cached pool and are joined before the epoch is marked ready. So the check-then-run gap you pointed out is closed both by the single-thread serialization and by the epoch re-check inside the lock. |
Refactor ConfigRegionStateMachine so leader become/resign transitions are strictly serial. All transitions (notifyLeaderReady / notifyNotLeader / notifyLeaderChanged) are submitted to a single-thread transition executor, which is the barrier that keeps epochs serial: one transition's orchestration runs to completion before the next begins. Within a become-leader epoch, load services start first to warm up as early as possible, then the remaining independent leader services start in parallel on a cached pool and are joined before the epoch is marked ready. The epoch is bumped eagerly on resign so an in-flight startup detects it is stale and bails out before re-enabling services after cleanup. This removes the giant lock-wrapped startLeaderServices method and the per-task submitIfLeaderServicesEpochCurrent helper; leaderServicesLock now only guards the (epoch, ready) pair.
7408a91 to
5c69640
Compare
This reverts commit 5c69640.
|



Summary
Tests